The shift of public debate to the digital sphere has been accompanied by a rise in online hate speech. While many promising approaches to hate speech classification have been proposed, studies often focus on a single language, usually English, and leave three key concerns unaddressed: post-deployment performance, classifier maintenance, and infrastructural limitations. In this paper, we introduce a new human-in-the-loop BERT-based hate speech classification pipeline and trace its development from initial data collection and annotation all the way to post-deployment. Our classifier, trained on our original corpus of over 422k examples, is specifically developed for the inherently multilingual setting of Switzerland; it reaches an F1 score of 80.5 and outperforms the currently best-performing BERT-based multilingual classifier by 5.8 F1 points in German and 3.6 F1 points in French. Our systematic evaluations over a 12-month period further highlight the vital importance of continuous, human-in-the-loop classifier maintenance to ensure robust hate speech classification post-deployment.
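As a rough sketch of one human-in-the-loop maintenance cycle of the kind described above, assuming confidence-thresholded routing of posts to human annotators (all names, thresholds, and helpers here are hypothetical, not the authors' implementation):

```python
# Minimal sketch: route low-confidence predictions to human annotation,
# so the labeled pool grows and the classifier can be periodically retrained.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Example:
    text: str
    label: Optional[int] = None          # filled in later by a human annotator

def predict(model, texts: List[str]) -> List[Tuple[int, float]]:
    """Stand-in for BERT inference: returns a (label, confidence) per post."""
    return [(0, 0.55) for _ in texts]    # placeholder predictions

def maintenance_cycle(model, stream: List[str], train_set: List[Example],
                      threshold: float = 0.7) -> List[Example]:
    """One post-deployment cycle: flag uncertain posts for annotation."""
    for text, (_, conf) in zip(stream, predict(model, stream)):
        if conf < threshold:             # model is uncertain about this post
            train_set.append(Example(text))
    return train_set                     # periodically fine-tune on this set
```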
Active learning as a paradigm in deep learning is especially important in applications involving intricate perception tasks such as object detection, where labels are difficult and expensive to acquire. Developing active learning methods in such fields is highly computationally expensive and time-consuming, which obstructs the progress of research and leads to a lack of comparability between methods. In this work, we propose and investigate a sandbox setup for rapid development and transparent evaluation of active learning in deep object detection. Our experiments with commonly used configurations of datasets and detection architectures found in the literature show that results obtained in our sandbox environment are representative of results on standard configurations. The total compute time to obtain results and assess the learning behavior can thereby be reduced by factors of up to 14 compared with Pascal VOC and up to 32 compared with BDD100k. This allows data acquisition and labeling strategies to be tested and evaluated in under half a day, and contributes to the transparency and development speed in the field of active learning for object detection.
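For orientation, here is a minimal sketch of the kind of pool-based active learning loop such a sandbox is meant to accelerate, with placeholder training and acquisition functions (the sandbox's actual datasets, detectors, and query strategies are not shown here):

```python
# Minimal sketch of a pool-based active learning loop: train, rank the
# unlabeled pool by an acquisition score, query the top items, repeat.
import random

def train(model, labeled):             # stand-in for detector training
    return model

def acquisition_score(model, sample):  # e.g. predictive entropy; placeholder
    return random.random()

def active_learning_loop(model, pool, labeled, budget=100, rounds=5):
    for _ in range(rounds):
        model = train(model, labeled)
        # rank the unlabeled pool by informativeness and query the top items
        pool.sort(key=lambda s: acquisition_score(model, s), reverse=True)
        queried, pool = pool[:budget], pool[budget:]
        labeled += queried             # labels come from the (human) oracle
    return train(model, labeled)
```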
Current state-of-the-art deep neural networks for image classification comprise 10-100 million learnable weights and are therefore inherently prone to overfitting. The weight count can be seen as a function of the number of channels, the spatial extent of the input, and the number of layers of the network. Due to the use of convolutional layers, weight complexity usually scales linearly with the resolution dimensions but remains quadratic in the number of channels. Active research in recent years on multigrid-inspired ideas in deep neural networks has shown, on the one hand, that a significant number of weights can be saved by appropriate weight sharing and, on the other, that a hierarchical structure in the channel dimension can improve the weight complexity to linear. In this work, we combine these multigrid ideas to introduce a joint framework of multigrid-inspired architectures that exploit multigrid structures in all relevant dimensions to achieve linear weight-complexity scaling and drastically reduced weight counts. Our experiments show that this structured reduction in weight count reduces overfitting and thus improves performance over state-of-the-art ResNet architectures on typical image classification benchmarks at lower network complexity.
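As a back-of-the-envelope illustration (not taken from the paper) of why dense channel coupling is quadratic while grouped, hierarchical coupling is linear, compare the weight counts of a dense $k \times k$ convolution and a grouped one with fixed group size $g$:

```latex
% Illustrative weight counts, assuming C_in = C_out = C:
W_{\mathrm{conv}} = k^2 \, C_{\mathrm{in}} C_{\mathrm{out}} = \mathcal{O}(C^2)
\qquad \text{vs.} \qquad
W_{\mathrm{grouped}} = \underbrace{\tfrac{C}{g}}_{\text{groups}} \cdot \, k^2 g^2
                     = k^2 g \, C = \mathcal{O}(C).
```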
Large pre-trained, zero-shot capable models have shown considerable success both for standard transfer and adaptation tasks, with particular robustness towards distribution shifts. In addition, subsequent fine-tuning can considerably improve performance on a selected downstream task. However, through naive fine-tuning, these zero-shot models lose their generalizability and robustness towards distribution shifts. This is a particular problem for tasks such as Continual Learning (CL), where continuous adaptation has to be performed as new task distributions are introduced sequentially. In this work, we showcase that where fine-tuning falls short in adapting such zero-shot capable models, simple momentum-based weight interpolation can provide consistent improvements for CL tasks in both memory-free and memory-based settings. In particular, we find improvements of over $+4\%$ on standard CL benchmarks, while in parts more than halving the gap to the upper limit of jointly training on all tasks at once, allowing the continual learner to inch closer to the joint-training limit.
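As a rough sketch of what momentum-based weight interpolation can look like in practice, an exponential moving average of the fine-tuned weights (the value of `tau` and the evaluation of the slow copy are assumptions, not the paper's exact recipe):

```python
# Minimal sketch: keep a slow, exponentially averaged copy of the weights
# while the fast copy is fine-tuned on each new task.
import copy
import torch

@torch.no_grad()
def momentum_interpolate(slow_model, fast_model, tau=0.99):
    """slow <- tau * slow + (1 - tau) * fast, parameter-wise."""
    for p_slow, p_fast in zip(slow_model.parameters(), fast_model.parameters()):
        p_slow.mul_(tau).add_(p_fast, alpha=1.0 - tau)

# Usage in continual learning: fine-tune fast_model on each incoming task,
# call momentum_interpolate after every update step, evaluate slow_model.
fast_model = torch.nn.Linear(512, 10)
slow_model = copy.deepcopy(fast_model)
momentum_interpolate(slow_model, fast_model)
```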
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image diffusion models gradually synthesize images in an iterative fashion while conditioning on text prompts. We find that their synthesis behavior qualitatively changes throughout this process: early in sampling, generation relies strongly on the text prompt to produce text-aligned content, while later the text conditioning is almost entirely ignored. This suggests that sharing model parameters throughout the entire generation process may not be ideal. Therefore, in contrast to existing works, we propose to train an ensemble of text-to-image diffusion models specialized for different synthesis stages. To maintain training efficiency, we initially train a single model, which is then split into specialized models that are trained for the specific stages of the iterative generation process. Our ensemble of diffusion models, called eDiff-I, achieves improved text alignment while maintaining the same inference computation cost and preserving high visual quality, outperforming previous large-scale text-to-image diffusion models on the standard benchmark. In addition, we train our model to exploit a variety of embeddings for conditioning, including the T5 text, CLIP text, and CLIP image embeddings. We show that these different embeddings lead to different behaviors. Notably, the CLIP image embedding provides an intuitive way of transferring the style of a reference image to the target text-to-image output. Lastly, we present a technique that enables eDiff-I's "paint-with-words" capability: a user can select a word in the input text and paint it on a canvas to control the output, which is very handy for crafting the desired image. The project page is available at https://deepimagination.cc/eDiff-I/
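A minimal sketch of the stage-specialization idea, assuming a simple timestep-threshold router over placeholder experts (eDiff-I's actual split points, expert count, and denoiser interfaces may differ):

```python
# Minimal sketch: route each sampling step to the expert responsible for
# that noise-level interval; high t = high noise = early, text-driven steps.
def make_router(experts, boundaries):
    """experts[i] handles t >= boundaries[i]; boundaries sorted descending."""
    def denoise(x, t, prompt_emb):
        for bound, expert in zip(boundaries, experts):
            if t >= bound:
                return expert(x, t, prompt_emb)
        return experts[-1](x, t, prompt_emb)
    return denoise

identity = lambda x, t, emb: x           # trivial stand-in denoisers
denoise = make_router([identity, identity, identity], [700, 300, 0])
out = denoise(x=[0.0], t=512, prompt_emb=None)  # routed to the middle expert
```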
A grand goal in deep learning research is to learn representations capable of generalizing across distribution shifts. Disentanglement is one promising direction, aimed at aligning a model's representations with the underlying factors generating the data (e.g. color or background). Existing disentanglement methods, however, rely on an often unrealistic assumption: that factors are statistically independent. In reality, factors (like object color and shape) are correlated. To address this limitation, we propose a relaxed disentanglement criterion - the Hausdorff Factorized Support (HFS) criterion - that encourages a factorized support, rather than a factorial distribution, by minimizing a Hausdorff distance. This allows for arbitrary distributions of the factors over their support, including correlations between them. We show that the use of HFS consistently facilitates disentanglement and recovery of ground-truth factors across a variety of correlation settings and benchmarks, even under severe training correlations and correlation shifts, with relative improvements of over +60% in parts over existing disentanglement methods. In addition, we find that leveraging HFS for representation learning can even facilitate transfer to downstream tasks such as classification under distribution shifts. We hope our original approach and positive empirical results inspire further progress on the open problem of robust generalization.
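A minimal sketch of what an HFS-style penalty on a batch of latents could look like, assuming a directed Hausdorff distance between the observed joint support and the recombined (factorized) support of each pair of latent dimensions; this illustrates the criterion, not the authors' implementation:

```python
# Minimal sketch: encourage the joint support of each latent-dimension pair
# to cover all recombinations of its marginals (a factorized support).
import torch

def directed_hausdorff(A, B):
    """max over a in A of the distance to a's nearest neighbor in B."""
    d = torch.cdist(A, B)                 # [|A|, |B|] pairwise distances
    return d.min(dim=1).values.max()

def hfs_penalty(z):
    """z: [N, D] float batch of latents."""
    N, D = z.shape
    loss = z.new_zeros(())
    for i in range(D):
        for j in range(i + 1, D):
            joint = z[:, [i, j]]                           # observed support
            grid = torch.cartesian_prod(z[:, i], z[:, j])  # factorized support
            # penalize recombined factor values that lie far from the joint
            loss = loss + directed_hausdorff(grid, joint)
    return loss

loss = hfs_penalty(torch.randn(16, 4))     # toy batch: 16 samples, 4 factors
```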
Speech manuscript (German + English) of the impulse lecture for the panel discussion "Can machines think?" at the 102nd German Katholikentag in Stuttgart on May 28, 2022. Panel: Winfried Kretschmann (MdL, Minister President of Baden-Württemberg, Stuttgart), Ursula Nothelle-Wildfeuer (Freiburg), Michael Resch (Stuttgart), Karsten Wendland (Aalen). Moderation: Verena Neuhausen (Stuttgart).
Proxy-based Deep Metric Learning (DML) learns deep representations by embedding images close to their class representatives (proxies), usually with respect to the angle between them. However, this disregards the embedding norm, which can carry additional beneficial context such as class- or image-intrinsic uncertainty. Moreover, proxy-based DML struggles to learn intra-class structures. To address both issues at once, we introduce non-isotropic probabilistic proxy-based DML. We model images as directional von Mises-Fisher (vMF) distributions on the hypersphere, which can reflect image-intrinsic uncertainties. Furthermore, we equip class proxies with non-isotropic von Mises-Fisher (nivMF) distributions to better represent complex class-specific variances. To measure the proxy-to-image distance between these models, we develop and investigate multiple distribution-to-point and distribution-to-distribution metrics. Each framework choice is motivated by a set of ablation studies, which showcase beneficial properties of our probabilistic approach to proxy-based DML, such as uncertainty-awareness, better-behaved gradients during training, and overall improved generalization performance. The latter is especially reflected in the competitive performance on standard DML benchmarks, where our approach compares favorably, suggesting that existing proxy-based DML can benefit significantly from a more probabilistic treatment. Code is available at github.com/explainableml/probabilistic_deep_metric_learning.
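As a simplified sketch of the vMF intuition, with the embedding norm acting as a concentration parameter that scales the angular proxy similarity; the names and the reduction to a single distribution-to-point score are assumptions, not the paper's nivMF metrics:

```python
# Minimal sketch: vMF-style class scores where confident images (large norm,
# hence large concentration kappa) are pulled more strongly to their proxy.
import torch
import torch.nn.functional as F

def vmf_proxy_logits(embeddings, proxies):
    """embeddings: [N, d] unnormalized; proxies: [C, d] proxy directions."""
    kappa = embeddings.norm(dim=1, keepdim=True)   # per-image concentration
    mu = F.normalize(embeddings, dim=1)            # mean directions
    p = F.normalize(proxies, dim=1)
    # vMF log-density up to normalization: kappa * <mu, proxy>
    return kappa * (mu @ p.t())                    # [N, C] class scores

# trained with, e.g., cross-entropy over classes:
logits = vmf_proxy_logits(torch.randn(8, 128), torch.randn(10, 128))
loss = F.cross_entropy(logits, torch.randint(0, 10, (8,)))
```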
We introduce Joint Multidimensional Scaling, a novel approach to unsupervised manifold alignment, which maps datasets from two different domains, without any known correspondences between data instances across the datasets, to a common low-dimensional Euclidean space. Our approach integrates Multidimensional Scaling (MDS) and Wasserstein Procrustes analysis into a joint optimization problem, to simultaneously generate isometric embeddings of the data and learn correspondences between instances from the two different datasets, while only requiring intra-dataset pairwise dissimilarities as input. This unique characteristic makes our approach applicable to datasets without access to input features, such as solving the inexact graph matching problem. We propose an alternating optimization scheme to solve the problem, which can fully benefit from the optimization techniques developed for MDS and Wasserstein Procrustes. We demonstrate the effectiveness of our approach in several applications, including joint visualization of two datasets, unsupervised heterogeneous domain adaptation, graph matching, and protein structure alignment.
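The two ingredients can be sketched in isolation as follows, assuming classical MDS and orthogonal Procrustes with a known row correspondence; the actual method couples these steps, together with Wasserstein-based correspondence learning, in one joint objective:

```python
# Minimal sketch: embed each dissimilarity matrix with classical MDS, then
# align the two embeddings with an orthogonal Procrustes step.
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS from a pairwise dissimilarity matrix D ([n, n])."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]              # top eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

def procrustes_align(X, Y):
    """Orthogonal Q minimizing ||X Q - Y||_F (assumes row correspondence)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return X @ (U @ Vt)

# X1, X2 = classical_mds(D1), classical_mds(D2); X1 = procrustes_align(X1, X2)
```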
Frequent and structurally related subgraphs, also known as network motifs, are valuable features of many graph datasets. However, the high computational complexity of identifying motif sets in arbitrary datasets (motif mining) has limited their use in many real-world datasets. By automatically leveraging the statistical properties of datasets, machine learning approaches have shown promise in several tasks with combinatorial complexity and are therefore promising candidates for network motif mining. In this work, we seek to facilitate the development of machine learning approaches to motif mining. We propose to formulate the motif mining problem as a node labelling task. In addition, we build benchmark datasets and evaluation metrics that test a model's ability to capture different aspects of motif discovery, such as motif number, size, topology, and scarcity. Next, we propose MotiFiesta, a first attempt at solving this problem in a fully differentiable manner, with promising results on challenging benchmarks. Finally, we demonstrate through MotiFiesta that this learning setting can be applied simultaneously to general-purpose data mining and to interpretable feature extraction for graph classification tasks.
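A minimal sketch of the node-labelling formulation, assuming binary per-node labels that mark membership in a motif occurrence (names are illustrative, not MotiFiesta's code):

```python
# Minimal sketch: motif mining as node labelling reduces training to binary
# node classification over "inside a motif occurrence" vs. "background".
import torch
import torch.nn.functional as F

def motif_node_loss(node_scores, motif_mask):
    """node_scores: [N] logits; motif_mask: [N], 1 if node is inside a motif."""
    return F.binary_cross_entropy_with_logits(node_scores, motif_mask.float())

scores = torch.randn(6)                  # model output for a 6-node graph
mask = torch.tensor([1, 1, 1, 0, 0, 0])  # nodes 0-2 form a planted motif
loss = motif_node_loss(scores, mask)
```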